Improving Multi-Modal Representations Using Image Dispersion: Why Less is Sometimes More

Authors

  • Douwe Kiela
  • Felix Hill
  • Anna Korhonen
  • Stephen Clark
Abstract

Models that learn semantic representations from both linguistic and perceptual input outperform text-only models in many contexts and better reflect human concept acquisition. However, experiments suggest that while the inclusion of perceptual input improves representations of certain concepts, it degrades the representations of others. We propose an unsupervised method to determine whether to include perceptual input for a concept, and show that it significantly improves the ability of multi-modal models to learn and represent word meanings. The method relies solely on image data, and can be applied to a variety of other NLP tasks.
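
The abstract does not spell out the dispersion measure itself, so the sketch below is only a minimal illustration of the idea: image dispersion is assumed to be the average pairwise cosine distance among a concept's image vectors, and concepts whose dispersion exceeds a threshold fall back to a text-only representation. The 0.6 cut-off and the simple concatenation fusion are illustrative assumptions, not the authors' exact configuration.

```python
import numpy as np

def image_dispersion(image_vectors):
    """Average pairwise cosine distance among a concept's image vectors.

    Low dispersion suggests a visually consistent (typically concrete)
    concept; high dispersion suggests a visually diverse, often more
    abstract one.
    """
    X = np.asarray(image_vectors, dtype=float)
    if len(X) < 2:
        return 0.0                                    # nothing to compare
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalise rows
    sims = X @ X.T                                    # pairwise cosine similarities
    i, j = np.triu_indices(len(X), k=1)               # each unordered pair once
    return float(np.mean(1.0 - sims[i, j]))

def build_representation(text_vec, image_vectors, threshold=0.6):
    """Include perceptual input only when image dispersion is low.

    The threshold and concatenation fusion are assumptions made for
    this sketch, not the paper's reported setup.
    """
    text_vec = np.asarray(text_vec, dtype=float)
    if image_dispersion(image_vectors) > threshold:
        return text_vec                               # text-only representation
    image_mean = np.mean(image_vectors, axis=0)
    return np.concatenate([text_vec, image_mean])
```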


Similar resources

Multi-modal Multi-task Learning for Automatic Dietary Assessment

We investigate the task of automatic dietary assessment: given meal images and descriptions uploaded by real users, our task is to automatically rate the meals and deliver advisory comments for improving users’ diets. To address this practical yet challenging problem, which is multi-modal and multi-task in nature, an end-to-end neural model is proposed. In particular, comprehensive meal represe...


Multi- and Cross-Modal Semantics Beyond Vision: Grounding in Auditory Perception

Multi-modal semantics has relied on feature norms or raw image data for perceptual input. In this paper we examine grounding semantic representations in raw auditory data, using standard evaluations for multi-modal semantics, including measuring conceptual similarity and relatedness. We also evaluate cross-modal mappings, through a zero-shot learning task mapping between linguistic and auditory...
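
The truncated abstract mentions a zero-shot task mapping between linguistic and auditory representations without specifying the mapping function. The sketch below assumes a regularised linear (ridge) map trained on seen concepts, with cosine-based retrieval for unseen ones, which is a common baseline for this kind of evaluation rather than the paper's confirmed method.

```python
import numpy as np
from sklearn.linear_model import Ridge

def train_cross_modal_map(ling_train, audio_train, alpha=1.0):
    """Fit a linear (ridge) map from linguistic vectors to auditory vectors.

    Assumed mapping function; the abstract does not state which was used.
    """
    model = Ridge(alpha=alpha)
    model.fit(ling_train, audio_train)
    return model

def zero_shot_retrieve(model, ling_query, audio_candidates):
    """Project an unseen concept's linguistic vector into auditory space
    and rank candidate auditory vectors by cosine similarity."""
    pred = model.predict(ling_query.reshape(1, -1))[0]
    pred = pred / np.linalg.norm(pred)
    C = audio_candidates / np.linalg.norm(audio_candidates, axis=1, keepdims=True)
    scores = C @ pred
    return np.argsort(-scores)          # candidate indices, best match first
```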


Improving Appearance Model Matching Using Local Image Structure

We show how non-linear representations of local image structure can be used to improve the performance of model matching algorithms in medical image analysis tasks. Rather than represent the image structure using intensity values or gradients, we use measures that indicate the reliability of a set of local image feature detector outputs. These features are image edges, corners, and gradients. F...


Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models

Textual-visual cross-modal retrieval has been a hot research topic in both computer vision and natural language processing communities. Learning appropriate representations for multi-modal data is crucial for the cross-modal retrieval performance. Unlike existing image-text retrieval approaches that embed image-text pairs as single feature vectors in a common representational space, we propose ...


Entropy and Laplacian images: Structural representations for multi-modal registration

The standard approach to multi-modal registration is to apply sophisticated similarity metrics such as mutual information. The disadvantage of these metrics, in comparison to measuring the intensity difference with, e.g. L1 or L2 distance, is the increase in computational complexity and consequently the increase in runtime of the registration. An alternative approach, which has not yet gained m...
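
To make the stated complexity trade-off concrete, the sketch below contrasts a cheap L1 intensity difference with a histogram-based mutual information estimate; the binning scheme and MI estimator are illustrative assumptions, not the registration pipeline used in the paper.

```python
import numpy as np

def l1_distance(img_a, img_b):
    """Cheap dissimilarity: mean absolute intensity difference."""
    return float(np.mean(np.abs(img_a - img_b)))

def mutual_information(img_a, img_b, bins=64):
    """Histogram-based MI estimate; costlier than L1 because it builds
    and normalises a joint intensity histogram at every evaluation."""
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal over image A intensities
    py = pxy.sum(axis=0, keepdims=True)   # marginal over image B intensities
    nz = pxy > 0                          # avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))
```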




Publication date: 2014